Interpreting Neural Policies with Disentangled Tree Representations
The advancement of robots, particularly those functioning in complex
human-centric environments, relies on control solutions that are driven by
machine learning. Understanding how learning-based controllers make decisions
is crucial, since robots are often safety-critical systems. This calls for a
formal and quantitative understanding of the explanatory factors in the
interpretability of robot learning. In this paper, we study the
interpretability of compact neural policies through the lens of disentangled
representations. We leverage decision trees to obtain factors of variation [1]
for disentanglement in robot learning; these encapsulate skills, behaviors, or
strategies toward solving tasks. To assess how well networks uncover the
underlying task dynamics, we introduce interpretability metrics that measure
disentanglement of the learned neural dynamics from concentration-of-decisions,
mutual-information, and modularity perspectives. Extensive experimental
analysis consistently demonstrates the connection between interpretability and
disentanglement.
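The mutual-information ingredient of such metrics can be sketched in a few lines. The variable names, binning, and toy data below are illustrative assumptions, not the paper's implementation:

```python
import math
from collections import Counter

# Toy sketch: discrete mutual information between a (binned) neuron
# response and a decision-tree factor. Higher values indicate that the
# neuron carries more information about the factor.

def mutual_information(xs, ys):
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# A neuron that fires exactly when the factor is active carries maximal
# information about it (here, H(factor) = ln 2 for a balanced factor).
factor = [0, 0, 1, 1, 0, 1, 0, 1]
neuron = [0, 0, 1, 1, 0, 1, 0, 1]   # perfectly aligned with the factor
noise  = [0, 1, 0, 1, 0, 1, 0, 1]   # unrelated activation pattern
print(mutual_information(neuron, factor))
print(mutual_information(noise, factor))
```

Disentanglement metrics of this flavor then aggregate such scores across neurons and factors, rewarding one-to-one alignment.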
Solving Continuous Control via Q-learning
While there has been substantial success in solving continuous control with
actor-critic methods, simpler critic-only methods such as Q-learning see
limited application in the associated high-dimensional action spaces. However,
most actor-critic methods come at the cost of added complexity: heuristics for
stabilisation, compute requirements and wider hyperparameter search spaces. We
show that a simple modification of deep Q-learning largely alleviates these
issues. By combining bang-bang action discretization with value decomposition,
thereby framing single-agent control as cooperative multi-agent reinforcement
learning (MARL), this simple critic-only approach matches the performance of
state-of-the-art continuous actor-critic methods when learning from features or
pixels. We extend classical bandit examples from cooperative MARL to provide
intuition for how decoupled critics leverage state information to coordinate
joint optimization, and demonstrate surprisingly strong performance across a
variety of continuous control tasks.
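The decoupled bang-bang idea can be sketched in a few lines. The dimension count, Q-table values, and function names below are illustrative assumptions, not the authors' code:

```python
# Hypothetical sketch: decoupled bang-bang critics for a 3-dimensional
# continuous action space. Each dimension is discretised to {-1, +1};
# a separate critic head scores each dimension, and the joint value is
# the sum of per-dimension values (a linear value decomposition).

BINS = (-1.0, 1.0)
N_DIMS = 3

def joint_q(per_dim_q, choice):
    """Sum per-dimension Q-values for one joint bang-bang action."""
    return sum(per_dim_q[d][b] for d, b in enumerate(choice))

def greedy_action(per_dim_q):
    """Decoupled argmax: each dimension is maximised independently,
    which is exact under the additive decomposition."""
    return tuple(max(range(len(BINS)), key=lambda b: per_dim_q[d][b])
                 for d in range(N_DIMS))

# Toy per-dimension Q-values for one state (would come from a network).
q = [[0.2, 0.9], [0.5, 0.1], [-0.3, 0.4]]
a = greedy_action(q)
print(a)                     # bin index chosen per dimension
print([BINS[b] for b in a])  # the resulting bang-bang action vector
```

The decomposition is what keeps the maximisation tractable: the joint argmax over 2^N actions reduces to N independent two-way choices.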
Learning to Plan via Deep Optimistic Value Exploration
Deep exploration requires coordinated long-term planning. We present a model-based reinforcement learning algorithm that guides policy learning through a value function that exhibits optimism in the face of uncertainty. We capture uncertainty over values by combining predictions from an ensemble of models and formulate an upper confidence bound (UCB) objective to recover optimistic estimates. Training the policy on ensemble rollouts with the learned value function as the terminal cost allows for projecting long-term interactions into a limited planning horizon, thus enabling deep optimistic exploration. We do not assume a priori knowledge of either the dynamics or reward function. We demonstrate that our approach can accommodate both dense and sparse reward signals, while improving sample complexity on a variety of benchmarking tasks.
Keywords: Reinforcement Learning; Deep Exploration; Model-Based; Value Function; UCB
Funding: Office of Naval Research; Qualcomm; Toyota Research Institute
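The optimistic estimate can be illustrated with a minimal sketch. The function name and the mean-plus-scaled-standard-deviation form are our assumptions of a generic UCB, not necessarily the paper's exact objective:

```python
import math

# Illustrative sketch: an optimistic value estimate from an ensemble of
# value predictions, using a UCB of the form mean + beta * std.

def ucb_value(ensemble_values, beta=1.0):
    n = len(ensemble_values)
    mean = sum(ensemble_values) / n
    var = sum((v - mean) ** 2 for v in ensemble_values) / n
    return mean + beta * math.sqrt(var)

# Disagreement between ensemble members inflates the optimistic target,
# directing the policy toward uncertain, under-explored regions.
print(ucb_value([1.0, 1.0, 1.0]))  # no disagreement -> 1.0
print(ucb_value([0.0, 1.0, 2.0]))  # disagreement adds an exploration bonus
```

Using this optimistic value as the terminal cost of short model rollouts is what lets a limited planning horizon still drive deep exploration.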
Locomotion Planning through a Hybrid Bayesian Trajectory Optimization
Locomotion planning for legged systems requires reasoning about suitable contact schedules. The contact sequence and timings constitute a hybrid dynamical system and prescribe a subset of achievable motions. State-of-the-art approaches cast motion planning as an optimal control problem. To decrease computational complexity, one common strategy separates footstep planning from motion optimization and plans contacts using heuristics. In this paper, we propose to learn contact schedule selection from high-level task descriptors using Bayesian Optimization. A bi-level optimization is defined in which a Gaussian Process model predicts the performance of trajectories generated by a motion planning nonlinear program. The agent therefore retains the ability to reason about suitable contact schedules, while explicit computation of the corresponding gradients is avoided. We delineate the algorithm in its general form and provide results for planning single-legged hopping. Our method learns contact schedule transitions that align with human intuition and performs competitively against a heuristic baseline in predicting task-appropriate contact schedules.
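The bi-level structure can be sketched abstractly. The toy cost function, candidate set, and the nearest-neighbour predictor standing in for the Gaussian Process surrogate are all illustrative assumptions, not the paper's method:

```python
# Highly simplified sketch of the bi-level loop: an outer loop proposes
# contact-schedule parameters (here a single phase-duration scalar), a
# surrogate predicts trajectory cost, and the most promising candidate
# is evaluated by the (expensive) motion planner. A real implementation
# would use a Gaussian Process surrogate and an acquisition function;
# a nearest-neighbour predictor stands in here, purely for illustration.

def expensive_planner_cost(duration):
    """Stand-in for the motion-planning NLP: cost of a hop with the
    given contact-phase duration (toy quadratic, optimum at 0.3)."""
    return (duration - 0.3) ** 2

def surrogate_predict(history, x):
    """Toy surrogate: cost of the nearest previously evaluated point."""
    nearest = min(history, key=lambda h: abs(h[0] - x))
    return nearest[1]

candidates = [i / 10 for i in range(1, 10)]        # durations 0.1 .. 0.9
history = [(0.1, expensive_planner_cost(0.1)),
           (0.9, expensive_planner_cost(0.9))]     # initial evaluations

for _ in range(5):
    # Outer loop: pick the unevaluated candidate the surrogate rates
    # best, then query the true planner and record the observation.
    remaining = [c for c in candidates if c not in {h[0] for h in history}]
    x = min(remaining, key=lambda c: surrogate_predict(history, c))
    history.append((x, expensive_planner_cost(x)))

best = min(history, key=lambda h: h[1])
print(best)
```

The key point is that the outer loop never needs gradients of the inner nonlinear program; it only observes its scalar cost.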
Inclusion of Angular Momentum During Planning for Capture Point Based Walking
When walking at high speeds, the swing legs of robots produce a non-negligible angular momentum rate. To accommodate this, we provide a reference trajectory generator for bipedal walking that incorporates predicted centroidal angular momentum at the planning stage. This can be done efficiently, as the Centroidal Moment Pivot (CMP), Instantaneous Capture Point (ICP), and center of mass (CoM) all have closed-form trajectory solutions due to their linear dynamics. These are then used to produce smooth, continuous trajectories. We furthermore provide a lightweight model to estimate the angular momentum induced during the leg-swing phase of the gait cycle. Our proposed trajectory generator is tested thoroughly in simulation and has been shown to operate successfully on the real hardware.
Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks
Experience replay plays a crucial role in improving the sample efficiency of
deep reinforcement learning agents. Recent advances in experience replay
propose using Mixup (Zhang et al., 2018) to further improve sample efficiency
via synthetic sample generation. We build upon this technique with Neighborhood
Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that
interpolates transitions with their closest neighbors in state-action space.
NMER preserves a locally linear approximation of the transition manifold by
only applying Mixup between transitions with vicinal state-action features.
Under NMER, a given transition's set of state-action neighbors is dynamic and
episode-agnostic, in turn encouraging greater policy generalizability via
inter-episode interpolation. We combine our approach with recent off-policy
deep reinforcement learning algorithms and evaluate on continuous control
environments. We observe that NMER improves sample efficiency by an average 94%
(TD3) and 29% (SAC) over baseline replay buffers, enabling agents to
effectively recombine previous experiences and learn from limited data.
Comment: Accepted to L4DC 202
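The neighborhood-Mixup step can be sketched as follows. The buffer contents, function names, and Beta parameter are illustrative assumptions; real NMER operates on a full replay buffer with learned policies, not toy tuples:

```python
import random

# Illustrative sketch of the core NMER idea (not the authors' code):
# pick a stored transition, find its nearest neighbour in concatenated
# state-action space, and Mixup-interpolate every element of the two
# transitions with a Beta-distributed coefficient.

def nearest_neighbor(buffer, idx):
    """Index of the transition closest to buffer[idx] in (s, a) space."""
    s, a = buffer[idx][0], buffer[idx][1]
    def dist(j):
        s2, a2 = buffer[j][0], buffer[j][1]
        return sum((x - y) ** 2 for x, y in zip(s + a, s2 + a2))
    return min((j for j in range(len(buffer)) if j != idx), key=dist)

def mixup(t1, t2, alpha=0.75):
    """Convex combination of two transitions (Mixup-style)."""
    lam = random.betavariate(alpha, alpha)
    def mix(u, v):
        return tuple(lam * x + (1 - lam) * y for x, y in zip(u, v))
    s1, a1, r1, s1n = t1
    s2, a2, r2, s2n = t2
    return (mix(s1, s2), mix(a1, a2), lam * r1 + (1 - lam) * r2, mix(s1n, s2n))

# Transitions are (state, action, reward, next_state) tuples.
buffer = [((0.0, 0.0), (1.0,), 0.5, (0.1, 0.0)),
          ((0.1, 0.0), (0.9,), 0.6, (0.2, 0.1)),
          ((5.0, 5.0), (-1.0,), -1.0, (4.9, 5.0))]

j = nearest_neighbor(buffer, 0)
synthetic = mixup(buffer[0], buffer[j])
```

Restricting Mixup to vicinal neighbours is what keeps the synthetic transitions near the transition manifold, rather than interpolating between dynamically unrelated experiences.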
Good Posture, Good Balance: Comparison of bio-inspired and model-based approaches for posture control of humanoid robots
This article provides a theoretical and thorough experimental comparison of two distinct posture control approaches: 1) a fully model-based control approach and 2) a biologically inspired approach derived from human observations. While the robotic approach can easily be applied to balancing in three-dimensional (3-D) and multi-contact (MC) situations, the biologically inspired balancer currently only works in two-dimensional situations but shows interesting robustness properties under time delays in the feedback loop. This is an important feature when considering the signal transmission and processing properties of the human sensorimotor system. Both controllers were evaluated in a series of experiments with a torque-controlled humanoid robot (TORO). The article concludes with some suggestions for the improvement of model-based balancing approaches in robotics.